Hands-on_Ex03

Hands-on Exercise
Author

Shermainn

Published

April 29, 2025

Modified

April 30, 2025

1. Overview of Hands-on Exercise 3

  1. I will learn how to create interactive data visualizations using functions provided by the ggiraph and plotly packages.
  2. I will also learn how to create animated data visualizations using the gganimate and plotly packages. In addition, I will (i) reshape data using the tidyr package and (ii) process, wrangle and transform data using the dplyr package.

1.1 Getting Started: Interactive Data Visualization

1.1.1. Install & Launch R packages

Install and Launch the following R packages:

  • ggiraph for making ‘ggplot’ graphics interactive.

  • plotly, R library for plotting interactive statistical graphs.

  • DT provides an R interface to the JavaScript library DataTables, which creates interactive tables on HTML pages.

  • tidyverse, a family of modern R packages specially designed to support data science, analysis and communication tasks, including creating static statistical graphs.

  • patchwork for combining multiple ggplot2 graphs into one figure.

pacman::p_load(ggiraph, plotly, 
               patchwork, DT, tidyverse)

1.1.2. Import Data

The code chunk below uses read_csv() of the readr package to import Exam_data.csv and save it as a tibble data frame named exam_data.

exam_data <- read_csv("data/Exam_data.csv")

1.2 Getting Started: Animated Data Visualization

1.2.1. Install and launch R packages

Install and Launch the following R packages:

  • plotly, R library for plotting interactive statistical graphs.

  • gganimate, a ggplot2 extension for creating animated statistical graphs.

  • gifski converts video frames to GIF animations using pngquant’s fancy features for efficient cross-frame palettes and temporal dithering. It produces animated GIFs that use thousands of colors per frame.

  • gapminder: An excerpt of the data available at Gapminder.org. We just want to use its country_colors scheme.

  • tidyverse, a family of modern R packages specially designed to support data science, analysis and communication tasks, including creating static statistical graphs.

pacman::p_load(readxl, gifski, gapminder,
               plotly, gganimate, tidyverse)

1.2.2 Import Data

Import the Data worksheet from the GlobalPopulation Excel workbook, convert Country and Continent to factors, and store Year as an integer.

col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls", sheet = "Data") %>%
  mutate(across(all_of(col), factor)) %>%  # replaces the deprecated mutate_each_() and funs()
  mutate(Year = as.integer(Year))
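
To inspect what this conversion step produces, here is a minimal sketch of the same conversion on a made-up tibble, written with across() and all_of(). The tibble `toy_pop` and its values are hypothetical and only stand in for the real workbook.

```r
library(dplyr)

# Hypothetical toy data standing in for GlobalPopulation.xls
toy_pop <- tibble(
  Country   = c("A", "B", "A"),
  Continent = c("X", "Y", "X"),
  Year      = c(2000, 2000, 2005)
)

col <- c("Country", "Continent")

toy_pop <- toy_pop %>%
  mutate(across(all_of(col), factor)) %>%  # convert the named columns to factors
  mutate(Year = as.integer(Year))          # store Year as an integer

str(toy_pop)
```

str() should report Country and Continent as factors and Year as an integer column.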

2.1 Interactive Data Visualization

2.1.1 ggiraph Methods

  • tooltip: a column of the data set that contains the tooltips to be displayed when the mouse is over an element.

  • data_id: a column of the data set that contains an id to be associated with each element.

  • onclick: a column of the data set that contains the JavaScript to be executed when an element is clicked.
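
The three aesthetics can be seen together in a minimal sketch. The data frame `toy` and its column names are made up for illustration; geom_point_interactive() is used here simply because it is the smallest interactive geom to demonstrate with.

```r
library(ggplot2)
library(ggiraph)

# Hypothetical toy data; column names are illustrative only
toy <- data.frame(
  id    = c("S1", "S2", "S3"),
  score = c(55, 70, 85),
  group = c("A", "A", "B")
)

# JavaScript to run on click, one string per row
toy$js <- sprintf("alert('%s')", toy$id)

p_toy <- ggplot(toy, aes(x = score, y = group)) +
  geom_point_interactive(
    aes(tooltip = id,      # text shown on hover
        data_id = group,   # elements sharing a data_id are highlighted together
        onclick = js),     # JavaScript executed on click
    size = 4
  )

widget <- girafe(ggobj = p_toy)
widget
```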

2.1.1.1 Tooltip effect

Two steps are needed to produce the tooltip effect (steps 1 and 2); steps 3 and 4 are optional refinements:

  1. An interactive version of a ggplot2 geom is used to create the basic graph.
  2. girafe() is then used to generate an SVG object to be displayed on an HTML page.
  3. Customize the tooltip style.
  4. Display statistics on the tooltip.

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = ID),
    stackgroups = TRUE, 
    binwidth = 1, 
    method = "histodot") +
  scale_y_continuous(NULL, 
                     breaks = NULL)
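# Printing p alone gives a static plot; girafe() renders the interactive version
girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6*0.618
)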

With the tooltip effect, hovering the mouse pointer over a data point of interest displays the information mapped to it, here the student's ID.

Several pieces of information, such as Name, Class, Race and Gender, can be displayed in one tooltip, as shown in the code chunk below.

exam_data$tooltip <- paste0(
  "Name = ", exam_data$ID,
  "\n Class = ", exam_data$CLASS,
  "\n Race = ", exam_data$RACE,
  "\n Gender = ", exam_data$GENDER)

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = tooltip),  # refer to the column by name inside aes()
    stackgroups = TRUE,
    binwidth = 1,
    method = "histodot") +
  scale_y_continuous(NULL,          
                     breaks = NULL)

girafe(
  ggobj = p,
  width_svg = 6,
  height_svg = 6*0.618
)
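
To see exactly what the tooltip column contains, here is a small base R sketch on two made-up rows. The values are hypothetical; only the column names ID, CLASS, RACE and GENDER come from exam_data.

```r
# Hypothetical toy rows that reuse the exam_data column names
toy <- data.frame(
  ID     = c("Student001", "Student002"),
  CLASS  = c("3A", "3B"),
  RACE   = c("Chinese", "Malay"),
  GENDER = c("Female", "Male")
)

# paste0() is vectorised, so it returns one tooltip string per row
toy$tooltip <- paste0(
  "Name = ",     toy$ID,
  "\nClass = ",  toy$CLASS,
  "\nRace = ",   toy$RACE,
  "\nGender = ", toy$GENDER
)

# cat() prints the first tooltip with its line breaks applied
cat(toy$tooltip[1])
```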

The tooltip style can be customized with opts_tooltip() of ggiraph by passing CSS declarations, for example to change the background and font colours.

tooltip_css <- "background-color:white; font-weight:bold; color:black;"

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(         
    aes(tooltip = ID),              
    stackgroups = TRUE,             
    binwidth = 1,                   
    method = "histodot") +          
  scale_y_continuous(NULL, breaks = NULL)

girafe(                             
  ggobj = p,                        
  width_svg = 6,                    
  height_svg = 6*0.618,
  options = list(
    opts_tooltip(
      css = tooltip_css))
)                               
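
When a tooltip style has several declarations, it can be easier to build the CSS string from named values. The helper `make_css()` below is hypothetical (it is not part of ggiraph); the resulting string could then be passed to opts_tooltip(css = ...).

```r
# Hypothetical helper (not part of ggiraph): turn a named character vector
# into a single string of CSS declarations
make_css <- function(decls) {
  paste0(names(decls), ":", decls, ";", collapse = " ")
}

tooltip_css <- make_css(c(
  "background-color" = "white",
  "font-weight"      = "bold",
  "color"            = "black"
))

tooltip_css
```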

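Statistics such as the mean and its standard error can be computed and displayed in the tooltip, as shown in the code chunk below. Note that mean_se() returns the mean plus or minus one standard error, which is narrower than a 90% confidence interval.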

tooltip <- function(y, ymax, accuracy = .01) {
  mean <- scales::number(y, accuracy = accuracy)
  sem <- scales::number(ymax - y, accuracy = accuracy)
  paste("Mean maths scores:", mean, "+/-", sem)
}

gg_point <- ggplot(data=exam_data, 
                   aes(x = RACE)) +
  stat_summary(aes(y = MATHS, 
                   tooltip = after_stat(  
                     tooltip(y, ymax))),  
    fun.data = "mean_se", 
    geom = GeomInteractiveCol,  
    fill = "light blue"
  ) +
  stat_summary(aes(y = MATHS),
    fun.data = mean_se,
    geom = "errorbar", width = 0.2, linewidth = 0.2  # linewidth replaces size for lines in ggplot2 3.4+
  )

girafe(ggobj = gg_point,
       width_svg = 8,
       height_svg = 8*0.618)
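
The error bars and tooltip in the chunk above come from mean_se(), which returns the mean plus or minus one standard error. Below is a minimal base R sketch of that calculation, with a t-based 90% confidence interval computed alongside for comparison. The scores in `x` are made up.

```r
# Hypothetical scores, for illustration only
x <- c(52, 61, 70, 74, 83)

n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)  # standard error of the mean

# The interval mean_se() reports with its default multiplier of 1
se_interval <- c(ymin = m - se, y = m, ymax = m + se)

# A two-sided 90% confidence interval based on the t distribution
t_crit <- qt(0.95, df = n - 1)
ci90   <- c(lower = m - t_crit * se, upper = m + t_crit * se)

se_interval
ci90
```

Because the t critical value is larger than 1, the 90% interval is wider than the one-standard-error interval.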

2.1.1.2 data_id aesthetic

The code chunks below show the hover effect provided by the data_id aesthetic, one of the interactive features of ggiraph. The first chunk repeats the basic tooltip plot for comparison.

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(
    aes(tooltip = ID),
    stackgroups = TRUE, 
    binwidth = 1, 
    method = "histodot") +
  scale_y_continuous(NULL, 
                     breaks = NULL)
girafe(                                 
  ggobj = p,                            
  width_svg = 6,                        
  height_svg = 6*0.618                  
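)

Setting data_id = CLASS links every dot from the same class, so hovering over one dot highlights all of its classmates.

p <- ggplot(data=exam_data, 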
       aes(x = MATHS)) +
  geom_dotplot_interactive(           
    aes(data_id = CLASS),   #default value of hover css fill is orange          
    stackgroups = TRUE,               
    binwidth = 1,                       
    method = "histodot") +              
  scale_y_continuous(NULL,              
                     breaks = NULL)

girafe(                                 
  ggobj = p,                            
  width_svg = 6,                        
  height_svg = 6*0.618                  
)

The highlighting style can be changed by passing CSS to opts_hover() and opts_hover_inv().

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(             
    aes(data_id = CLASS),              
    stackgroups = TRUE,                 
    binwidth = 1,                       
    method = "histodot") +              
  scale_y_continuous(NULL,              
                     breaks = NULL)

girafe(                                 
  ggobj = p,                            
  width_svg = 6,                        
  height_svg = 6*0.618,
  options = list(                       
    opts_hover(css = "fill: #202020;"), 
    opts_hover_inv(css = "opacity:0.2;")
  )                                     
)

The tooltip and hover effects can be combined by mapping both aesthetics.

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(         
    aes(tooltip = CLASS, 
        data_id = CLASS),           
    stackgroups = TRUE,             
    binwidth = 1,                   
    method = "histodot") +          
  scale_y_continuous(NULL,          
                     breaks = NULL)

girafe(                             
  ggobj = p,                        
  width_svg = 6,                    
  height_svg = 6*0.618,
  options = list(                   
    opts_hover(css = "fill: #202020;"),  
    opts_hover_inv(css = "opacity:0.2;") 
  )                                 
)                                   

2.1.1.3 Onclick

The onclick aesthetic of ggiraph provides hotlink interactivity on the web: clicking a data object runs the attached JavaScript, which here opens the linked web document in a new browser window or tab.

exam_data$onclick <- sprintf("window.open(\"%s%s\")",
"https://www.moe.gov.sg/schoolfinder?journey=Primary%20school",
as.character(exam_data$ID))

## click actions need to be a "str" column containing javascript instructions

p <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(             
    aes(onclick = onclick),             
    stackgroups = TRUE,                 
    binwidth = 1,                       
    method = "histodot") +              
  scale_y_continuous(NULL,              
                     breaks = NULL)

girafe(                                 
  ggobj = p,                            
  width_svg = 6,                        
  height_svg = 6*0.618)                                        
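
The onclick column is just a character vector of JavaScript statements, one per row. Here is a minimal base R sketch of how such strings can be built. The IDs and the base URL are hypothetical, and URLencode() is added on the assumption that an ID might contain characters such as spaces.

```r
# Hypothetical IDs and base URL, for illustration only
ids      <- c("Student001", "Student 002")
base_url <- "https://example.com/profile?id="

# URLencode() escapes characters such as spaces so the link stays valid
encoded_ids <- vapply(ids, URLencode, character(1),
                      reserved = TRUE, USE.NAMES = FALSE)

# One window.open() call per row
onclick <- sprintf('window.open("%s%s")', base_url, encoded_ids)

onclick
```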

2.1.1.4 Coordinated Multiple Views with ggiraph

  • Use the interactive aesthetics of ggiraph, such as data_id to link observations across plots and tooltip to show information on hover.

  • Combine the plots with patchwork, introduced in Hands-on Exercise 2.

p1 <- ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot_interactive(              
    aes(data_id = ID),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") +  
  coord_cartesian(xlim=c(0,100)) + 
  scale_y_continuous(NULL,               
                     breaks = NULL)

p2 <- ggplot(data=exam_data, 
       aes(x = ENGLISH)) +
  geom_dotplot_interactive(              
    aes(data_id = ID),              
    stackgroups = TRUE,                  
    binwidth = 1,                        
    method = "histodot") + 
  coord_cartesian(xlim=c(0,100)) + 
  scale_y_continuous(NULL,               
                     breaks = NULL)

girafe(code = print(p1 + p2), 
       width_svg = 6,
       height_svg = 3,
       options = list(
         opts_hover(css = "fill: #202020;"),
         opts_hover_inv(css = "opacity:0.2;")
         )
       ) 

2.1.2 plotly Methods

There are two ways to use plotly:

  1. using plot_ly()
  2. using ggplotly()

plot_ly(data = exam_data, 
        x = ~MATHS, 
        y = ~GENDER,
        color = ~RACE)  # plot_ly() expects `color`, not `colour`

Using subplot() and highlight_key(), I can compare students' scores for Maths, Science and English. Clicking a data point in one scatterplot highlights the same student in the other scatterplot.

  • highlight_key() is used to share data between plots; it creates an object of class crosstalk::SharedData.

  • subplot() helps to place plots side by side

d <- highlight_key(exam_data)

p1 <- ggplot(data=d,
            aes(x = ENGLISH,
                y = SCIENCE)) +
  geom_point(size=1) + 
  coord_cartesian(xlim=c(0,100), 
                  ylim=c(0,100))
p2 <- ggplot(data=d, 
             aes(x = ENGLISH,
                y = MATHS)) +
      geom_point(size=1) +
      coord_cartesian(xlim=c(0,100), 
                  ylim=c(0,100))

subplot(ggplotly(p1),
        ggplotly(p2))

2.1.3 crosstalk Methods

Crosstalk is an add-on to the htmlwidgets package. It extends htmlwidgets with a set of classes, functions and conventions for implementing cross-widgets interactions (currently, linked brushing and filtering).

  • DT is a wrapper of the JavaScript library DataTables.

  • Data objects in R can be rendered as HTML tables using JavaScript library “DataTables” via R Markdown or Shiny.

DT::datatable(exam_data, class = "compact")

The code chunk below implements coordinated brushing.

  • highlight() sets a variety of options for brushing (i.e. highlighting) multiple plots. It is primarily designed to link multiple plotly graphs together and may not behave as expected when linking plotly to another htmlwidget package via crosstalk. In some cases, other htmlwidgets will respect these options, such as persistent selection in leaflet.

  • bscols() is a helper function of crosstalk that places HTML elements next to each other. It can be called directly from the console but is designed specifically for R Markdown.

d <- highlight_key(exam_data)

p <- ggplot(d, 
            aes(ENGLISH, MATHS)) +
    geom_point(size=1) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100))

gg <- highlight(ggplotly(p), 
                "plotly_selected")

crosstalk::bscols(gg, DT::datatable(d), widths = 5)
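
Linked brushing depends on each row having a key that every widget agrees on. The sketch below calls the crosstalk SharedData constructor directly on made-up data and keys the rows by ID. The data frame `toy` is hypothetical, and using ID as the key is an assumption for illustration.

```r
library(crosstalk)

# Hypothetical toy data
toy <- data.frame(
  ID      = c("S1", "S2", "S3"),
  ENGLISH = c(60, 75, 90),
  MATHS   = c(55, 80, 85)
)

# Key each row by ID so that a selection in one widget can be matched
# to the same student in another widget
shared <- SharedData$new(toy, key = ~ID)

shared$key()             # the keys crosstalk uses to link rows
nrow(shared$origData())  # the underlying data is unchanged
```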

2.2 Additional Plot: Interactive map of Singapore with ggiraph

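The console output below comes from reading the planning-area shapefile and the 2024 resident population table used for the map.

Reading layer `MP14_PLNG_AREA_WEB_PL' from data source 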
  `C:\Users\user1\Downloads\MasterPlan2014PlanningAreaBoundaryWebSHP\MP14_PLNG_AREA_WEB_PL.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 55 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
[1] "2024"
# A tibble: 6 × 5
  `Planning Area` Subzone Age   Sex     `2024` 
  <chr>           <chr>   <chr> <chr>   <chr>  
1 Total           Total   Total Total   4180870
2 Total           Total   Total Males   2034960
3 Total           Total   Total Females 2145900
4 Total           Total   0     Total   29250  
5 Total           Total   0     Males   14970  
6 Total           Total   0     Females 14280  
[1] "Planning Area" "Subzone"       "Age"           "Sex"          
[5] "2024"         
 [1] "OBJECTID"   "PLN_AREA_N" "PLN_AREA_C" "CA_IND"     "REGION_N"  
 [6] "REGION_C"   "INC_CRC"    "FMEL_UPD_D" "X_ADDR"     "Y_ADDR"    
[11] "SHAPE_Leng" "SHAPE_Area" "geometry"  
# A tibble: 6 × 2
  planning_area resident_population
  <chr>                       <dbl>
1 Ang Mo Kio                1276360
2 Bedok                     2216060
3 Bishan                     703930
4 Boon Lay                       40
5 Bukit Batok               1343100
6 Bukit Merah               1187950

The static choropleth below shades each planning area by resident population.

ggplot(singapore_map) +
  geom_sf(aes(fill = resident_population), color = "white") +
  scale_fill_viridis_c(option = "plasma") +
  theme_minimal() +
  labs(title = "Singapore Population by Planning Area (2024)", fill = "Population")

#Area data
singapore_map$area_km2 <- as.numeric(singapore_map$SHAPE_Area) / 1e6

singapore_map$area_cat <- ntile(singapore_map$area_km2, 5)
singapore_map$area_cat <- factor(singapore_map$area_cat)

#Tooltip
singapore_map$tooltip <- paste0(
  singapore_map$PLN_AREA_N, " — ",
  round(singapore_map$area_km2, 2), " km²"
)
#ggiraph
gg <- ggplot(data = singapore_map) +
  geom_sf_interactive(
    aes(fill = area_cat, tooltip = tooltip),
    color = "white"
  ) +
  scale_fill_brewer(palette = "Blues") +
  theme_minimal() +
  labs(fill = "Area Size (quintiles)", title = "Singapore Planning Areas by Land Area")

girafe(ggobj = gg)
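
The quintile step in the chunk above relies on ntile() from dplyr. A minimal sketch on made-up areas shows how it assigns groups; the values in `area_km2` are hypothetical.

```r
library(dplyr)

# Hypothetical areas in square kilometres
area_km2 <- c(0.5, 1.2, 3.4, 5.0, 7.7, 9.1, 12.3, 15.8, 21.0, 40.2)

# ntile() ranks the values and splits them into 5 groups of (near-)equal size
area_cat <- factor(ntile(area_km2, 5))

table(area_cat)
```

With ten values and five groups, each quintile holds two areas, regardless of how unevenly the values are spread.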

3. Animated Data Visualization

3.1 Terminology

Before building animated visualizations, it helps to understand some key concepts and terminology.

  1. Frame: In an animated line graph, each frame represents a different point in time or a different category. When the frame changes, the data points on the graph are updated to reflect the new data.

  2. Animation Attributes: The animation attributes are the settings that control how the animation behaves. For example, you can specify the duration of each frame, the easing function used to transition between frames, and whether to start the animation from the current frame or from the beginning.
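
As a rough illustration of how these attributes interact, playback length is the number of frames divided by the frame rate. The helper `animation_seconds()` below is hypothetical; its argument names mirror the nframes and fps arguments of gganimate's animate(), whose defaults (100 frames at 10 frames per second) are used in the first call.

```r
# Hypothetical helper: playback length in seconds for a given
# number of frames and frame rate
animation_seconds <- function(nframes, fps) {
  nframes / fps
}

default_length <- animation_seconds(nframes = 100, fps = 10)
smooth_length  <- animation_seconds(nframes = 200, fps = 25)

default_length
smooth_length
```

Raising the frame rate without adding frames shortens the animation, so smoother playback usually needs more frames as well.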

3.2 gganimate Methods

gganimate brings static ggplot2 plots to life by turning them into animations. Its key components are easiest to picture by imagining an animated bouncing ball:

  • transition_time() decides when and where the ball moves (frame by frame).

  • view_follow() makes the camera follow the ball.

  • shadow_mark() shows the ball’s trail as it bounces.

  • enter_fade() makes the ball fade into view when it first appears (the enter_*() and exit_*() functions control how elements appear and disappear).

  • ease_aes() makes the motion look smooth and natural — not robotic.

3.2.1 Static bubble plots

  • transition_time() of gganimate is used to create transition through distinct states in time (i.e. Year).

  • ease_aes() is used to control easing of aesthetics. The default is linear. Other methods are: quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce.

ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') 

ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') +
  transition_time(Year) +       
  ease_aes('linear')          
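
To give a feel for what easing means, the sketch below compares a linear curve with a cubic in-out curve. These two functions are illustrative formulas written for this note, not gganimate's internal code; `t` is the progress through a transition, from 0 to 1.

```r
# Illustrative easing curves (not gganimate's internal implementation)
ease_linear <- function(t) t

ease_cubic_in_out <- function(t) {
  # slow start, fast middle, slow finish
  ifelse(t < 0.5, 4 * t^3, 1 - (-2 * t + 2)^3 / 2)
}

t <- seq(0, 1, by = 0.25)

easing_table <- data.frame(
  t            = t,
  linear       = ease_linear(t),
  cubic_in_out = ease_cubic_in_out(t)
)

easing_table
```

In gganimate, this kind of easing is selected with ease_aes('cubic-in-out') in place of ease_aes('linear').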

3.2.2 plotly Methods

In the plotly R package, both ggplotly() and plot_ly() support key frame animations through the frame argument/aesthetic. They also support an ids argument/aesthetic to ensure smooth transitions between objects with the same id (which helps facilitate object constancy).

gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(alpha = 0.7) + ## aes(frame & size) not working w ggplot anymore
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young')

ggplotly(gg)

The legend can be removed with theme(legend.position = 'none'):

gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(alpha = 0.7) + # aes(size = Population, frame = Year) not working
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young') + 
  theme(legend.position='none') #removes legend

ggplotly(gg)

With plot_ly(), the frame argument animates the bubble plot by Year:

bp <- globalPop %>%
  plot_ly(x = ~Old, 
          y = ~Young, 
          size = ~Population, 
          color = ~Continent,
          sizes = c(2, 100),
          frame = ~Year, 
          text = ~Country, 
          hoverinfo = "text",
          type = 'scatter',
          mode = 'markers'
          ) %>%
  layout(showlegend = FALSE)
bp
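
The frame and ids arguments described above can be tried on a small made-up data set. The sketch below also sets the timing with animation_opts(); the data frame `toy` and the timing values are hypothetical.

```r
library(plotly)

# Hypothetical toy data: two countries observed in two years
toy <- data.frame(
  Country = c("A", "B", "A", "B"),
  Year    = c(2000, 2000, 2010, 2010),
  Old     = c(10, 20, 15, 25),
  Young   = c(40, 30, 35, 25)
)

bp_toy <- plot_ly(
  toy,
  x = ~Old, y = ~Young,
  frame = ~Year,     # one animation frame per year
  ids   = ~Country,  # match the same country across frames
  text  = ~Country, hoverinfo = "text",
  type  = "scatter", mode = "markers"
) %>%
  animation_opts(frame = 1000,      # milliseconds per frame
                 transition = 500,  # milliseconds spent moving between frames
                 easing = "linear")

bp_toy
```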